Abstract

A nearest-neighbor-interchange (NNI) walk is
a sequence of unrooted phylogenetic trees,
$T_0, T_1, T_2, \ldots$, where each consecutive pair of
trees differs by a single NNI move. We give
tight bounds on the length of the shortest NNI-walks
that visit all trees in a subtree-prune-and-regraft
(SPR) neighborhood of a given tree.
For any unrooted, binary tree, $T$, on $n$ leaves,
the shortest walk takes $\Theta(n^2)$ more steps
than the number of trees in the SPR neighborhood.
This answers Bryant's Second Combinatorial
Conjecture from the Phylogenetics Challenges
List, the Isaac Newton Institute, 2011,
and the Penny Ante Problem List, 2009.

1 Introduction 

Evolutionary histories, or phylogenies, are essential
structures for modern biology [10]. Finding
the optimal phylogeny is NP-hard, even when we 
restrict to tree-like evolution [HI |T2]. As such, 
heuristic searches are used to search the vast 
set of all trees. There are many search tech- 
niques used (see [15] for a survey), but most 
rely on local search. That is, at each step in
the search, the next tree is chosen from the
"neighbors" of the current tree. A popular way
to define neighbors is in terms of the
subtree-prune-and-regraft (SPR) metric (defined in
Section 2). For a given unrooted tree on $n$ leaves,
or taxa, the SPR neighborhood is the set
of trees that differ from it by a single SPR move.
The number of trees in the SPR neighborhood
is $(2n - 6)(2n - 7)$. The second "Walks on
Trees" conjecture of Bryant [5, 14] focuses on
efficiently traversing this neighborhood via
nearest-neighbor-interchange (NNI) transformations
(defined in Section 2). Bryant asks:

An NNI-walk is a sequence
$T_1, T_2, \ldots, T_k$ of unrooted binary
phylogenetic trees where each consecutive
pair of trees differ by a single
NNI.

i. [Question] What is the shortest NNI 
walk that passes through all binary 
trees on n leaves? 

ii. [Question] Suppose we are given a 
tree T. What is the shortest NNI walk 
that passes through all the trees that 
lie at most one SPR (subtree prune and 
regraft) move from T? 

Bryant's conjectures were posed as part of 
the New Zealand Phylogenetic Meetings' Penny 
Ante Problems [5] as well as the Challenges prob- 
lems from the most recent Phylogenetics Meeting 
at the Isaac Newton Institute [14].

We answer the second question, proving that
the shortest walk takes $\Theta(n^2)$ more steps than
the theoretical minimum that visits every tree
exactly once (that is, a Hamiltonian path). This
builds on past work [6] showing that a Hamiltonian
path is not possible.
Figure 1: The trees on the left and center differ by a single NNI move. The tree on the right differs by a
single SPR move from the center tree.



2 Background 

This section includes definitions and results that
we use from Allen & Steel [1]. For a more detailed
background on mathematical phylogenetics,
see Semple & Steel [13].

Definition 1. An unrooted binary phylogenetic
tree (or more briefly a binary tree) is a tree whose
leaves (degree 1 vertices) are labelled bijectively
by a (species) set $S$, and such that each non-leaf
vertex is unlabelled and has degree three. We
let $UB(n)$ denote the set of such trees for
$S = \{1, \ldots, n\}$.

Each internal edge $e$ of a tree $T \in UB(n)$
yields a natural bipartition, or split, of the taxa.
We write $A \mid B$ if there is an edge which partitions
the leaf set into the two sets $A$ and $B$. We
use the standard notation $T_A$ to refer to the
smallest subtree of $T$ containing leaves only from
$A$.

Figure 1 shows several binary trees. Each edge
of a tree induces a split of the leaf set $S$. The
Nearest Neighbor Interchange (NNI) is a distance
metric introduced independently by DasGupta
et al. [7] and Li et al. [11]. Roughly, an NNI
operation swaps two subtrees that are separated
by an internal edge in order to generate a new
tree [1].

Definition 2. Allen and Steel [1]: Any internal
edge of an unrooted binary tree has four subtrees
attached to it. A nearest neighbor interchange
(NNI) occurs when one subtree on one
side of an internal edge is swapped with a subtree
on the other side of the edge, as illustrated
in Figure 1.

Definition 3. The NNI distance,
$d_{NNI}(T_1, T_2)$, between two trees $T_1$ and $T_2$
is defined as the minimum number of NNI
operations required to change one tree into the
other.

The complexity of computing the NNI distance
was open for over 25 years, and was proven
to be NP-complete by Allen and Steel [1]. A
tree with $n$ uniquely labeled leaves has
$n - 3$ internal branches, and each internal branch
admits two distinct swaps. Thus, there are $2(n - 3)$
NNI rearrangements for any tree.
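The count of $2(n - 3)$ NNI rearrangements can be checked on a small case. The sketch below is illustrative only: the adjacency-set encoding of a tree and the helper names `internal_edges`, `leaf_splits`, and `nni_neighbors` are assumptions introduced here, not part of the paper.

```python
def internal_edges(adj):
    """List each edge joining two degree-3 vertices exactly once."""
    seen = []
    for u in adj:
        for v in adj[u]:
            if len(adj[u]) == 3 and len(adj[v]) == 3 and (v, u) not in seen:
                seen.append((u, v))
    return seen


def leaf_splits(adj):
    """Canonical form of a tree: the leaf bipartitions of its internal edges."""
    leaves = {v for v in adj if len(adj[v]) == 1}
    ref = min(leaves)
    found = set()
    for u, v in internal_edges(adj):
        side, stack = set(), [(u, v)]
        while stack:
            node, parent = stack.pop()
            if node in leaves:
                side.add(node)
            stack.extend((w, node) for w in adj[node] if w != parent)
        # Record the side that does not contain the reference leaf.
        found.add(frozenset(leaves - side if ref in side else side))
    return frozenset(found)


def nni_neighbors(adj):
    """All trees one NNI move away: two per internal edge."""
    result = []
    for u, v in internal_edges(adj):
        b = next(w for w in adj[u] if w != v)    # one subtree hanging off u
        for c in [w for w in adj[v] if w != u]:  # either subtree hanging off v
            new = {k: set(nbrs) for k, nbrs in adj.items()}
            # Swap subtrees b and c across the edge (u, v).
            new[u].remove(b)
            new[u].add(c)
            new[v].remove(c)
            new[v].add(b)
            new[b].remove(u)
            new[b].add(v)
            new[c].remove(v)
            new[c].add(u)
            result.append(new)
    return result


# Five-leaf tree with cherries {1, 2} and {4, 5}; "x", "y", "z" are internal.
tree = {
    1: {"x"}, 2: {"x"}, 3: {"y"}, 4: {"z"}, 5: {"z"},
    "x": {1, 2, "y"}, "y": {"x", 3, "z"}, "z": {"y", 4, 5},
}
n = 5
neighbors = nni_neighbors(tree)
distinct = {leaf_splits(t) for t in neighbors}
print(len(neighbors), len(distinct), leaf_splits(tree) in distinct)  # 4 4 False
```

For this 5-leaf tree the two internal edges give four neighbors, all distinct from one another and from the starting tree, in line with $2(n - 3) = 4$.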

One of the most popular moves used to 
search treespace is the Subtree-Prune-and- 
Regraft (SPR). Roughly, an SPR move prunes 
a selected subtree and then reattaches it on an 
edge selected from the remaining tree (see Figure 1).

Definition 4. Allen and Steel [1]: A subtree
prune and regraft (SPR) on a phylogenetic
tree $T$ is defined as cutting any edge and thereby
pruning a subtree, $t$, and then regrafting the subtree
by the same cut edge to a new vertex obtained
by subdividing a pre-existing edge in $T - t$.
We also apply a forced contraction to maintain 
the binary property of the resulting tree (see Fig- 



Definition 5. The SPR distance,
$d_{SPR}(T_1, T_2)$, between two trees is the minimal
number of SPR moves needed to transform
the first tree into the second tree.
Figure 2: Left: The SPR neighborhood of a 7-leaf caterpillar tree. The highlighted nodes show the trees
in the orbit that prunes a leaf from one of the sibling pairs. We note that any NNI-walk that visits
every tree in this SPR neighborhood will visit some trees more than once. Right: The orbit of the edge
$e = 1, 2, 3, 4 \mid 5, 6, 7, 8, 9, 10$ from the tree (1, (2, (3, (4, (5, ((6, 7), (8, (9, 10)))))))), shown with respect to the
tree. The tree is shown in the background with edge $e$ highlighted. An SPR move is determined both by
the edge pruned and the target edge of the regrafting. The trees in the orbit (red dots) are shown relative
to the regrafting edge in the initial tree. The blue lines indicate NNI-edges in the orbit. We note that the
edges adjacent to $e$ yield the initial tree when used as the target edge, and so do not produce a new tree.



The calculation of SPR distances has been 
proven NP-complete for both rooted and un- 
rooted trees [H |9]. Approximation algorithms 
for SPR on rooted trees exist [2, 3].

Definition 6. Let $T_0$ be an unrooted, binary
tree. Define $N_{SPR}(T_0)$ to be the SPR
neighborhood of $T_0$, namely,

$N_{SPR}(T_0) = \{T \mid d_{SPR}(T_0, T) = 1\}$

When the tree in question is obvious, we will
drop the argument and call the neighborhood
$N_{SPR}$.
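For small trees the SPR neighborhood can be enumerated by brute force and compared with the count $(2n - 6)(2n - 7)$ quoted in the Introduction. The following is a minimal sketch under stated assumptions: trees are stored as adjacency sets, two trees are identified when their internal edges induce the same leaf bipartitions, and the names `caterpillar`, `leaf_splits`, and `spr_neighbors` are hypothetical helpers written for this example.

```python
def caterpillar(n):
    """Adjacency sets of an n-leaf caterpillar: leaves 1..n, spine 'v0', 'v1', ..."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    spine = [f"v{i}" for i in range(n - 2)]
    for a, b in zip(spine, spine[1:]):
        link(a, b)
    link(spine[0], 1)
    for i, v in enumerate(spine):
        link(v, i + 2)
    link(spine[-1], n)
    return adj


def leaf_splits(adj):
    """Canonical form of a tree: the leaf bipartitions of its internal edges."""
    leaves = {v for v in adj if len(adj[v]) == 1}
    ref = min(leaves)
    found = set()
    for u in adj:
        for v in adj[u]:
            if len(adj[u]) == 3 and len(adj[v]) == 3:
                side, stack = set(), [(u, v)]
                while stack:
                    node, parent = stack.pop()
                    if node in leaves:
                        side.add(node)
                    stack.extend((w, node) for w in adj[node] if w != parent)
                found.add(frozenset(leaves - side if ref in side else side))
    return frozenset(found)


def spr_neighbors(adj):
    """Distinct trees exactly one SPR move from the tree `adj`."""
    found = set()
    for p in adj:
        if len(adj[p]) != 3:
            continue
        for s in adj[p]:  # prune the subtree hanging off p through s
            q, r = [w for w in adj[p] if w != s]
            pruned = {k: set(nbrs) for k, nbrs in adj.items()}
            del pruned[p]
            for w in (s, q, r):
                pruned[w].discard(p)
            pruned[q].add(r)  # forced contraction: join p's other neighbours
            pruned[r].add(q)
            # Collect the edges of the remaining tree (the component of q).
            targets, stack = [], [(q, None)]
            while stack:
                node, parent = stack.pop()
                for w in pruned[node]:
                    if w != parent:
                        targets.append((node, w))
                        stack.append((w, node))
            for x, y in targets:  # regraft onto each target edge
                t = {k: set(nbrs) for k, nbrs in pruned.items()}
                t[x].remove(y)
                t[y].remove(x)
                t[p] = {x, y, s}
                for w in (x, y, s):
                    t[w].add(p)
                found.add(leaf_splits(t))
    found.discard(leaf_splits(adj))  # regrafting in place returns T_0
    return found


for n in (5, 6, 7):
    print(n, len(spr_neighbors(caterpillar(n))), (2 * n - 6) * (2 * n - 7))
```

The enumeration is quadratic in the number of edges per tree and is meant only as a sanity check on small $n$, not as an efficient algorithm.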

Definition 7. Let $T_0$ be an unrooted, binary tree
and $S$ be a set of trees that are 1 SPR move
from $T_0$. Define $N_{NNI}(S, T_0)$ to be the NNI
neighbors of $S$, namely,

$N_{NNI}(S, T_0) = \{T \mid \exists\, \tau \in S,\ d_{NNI}(T, \tau) = 1 \text{ and } d_{SPR}(T_0, \tau) = 1\}$

We note that for every subset $S$ of the SPR
neighborhood of $T_0$, $S \subseteq N(S)$.

A "sibling pair" or "cherry" in a tree is a pair of
leaves that have the same parent. A "caterpillar
tree" refers to an unrooted tree with exactly 2
sibling pairs.
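The cherry and caterpillar definitions can be tested mechanically. This sketch assumes trees are written as nested tuples, as in the caption of Figure 2, and that $n \ge 4$; the function names `leaf_set`, `cherries`, and `is_caterpillar` are illustrative and not taken from the paper.

```python
def leaf_set(tree):
    """Return the frozenset of leaf labels below a nested-tuple node."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def cherries(tree):
    """Sibling pairs of the unrooted tree encoded by `tree` (n >= 4 leaves)."""
    all_leaves = leaf_set(tree)
    n = len(all_leaves)
    pairs = set()

    def visit(node):
        if not isinstance(node, tuple):
            return
        side = leaf_set(node)
        if len(side) == 2:
            pairs.add(side)               # a cherry below this node
        if len(side) == n - 2:
            pairs.add(all_leaves - side)  # a cherry that straddles the root
        for child in node:
            visit(child)

    visit(tree)
    return pairs


def is_caterpillar(tree):
    """A caterpillar has exactly two sibling pairs."""
    return len(cherries(tree)) == 2


cat7 = (1, (2, (3, (4, (5, (6, 7))))))
fig2 = (1, (2, (3, (4, (5, ((6, 7), (8, (9, 10))))))))
print(is_caterpillar(cat7), sorted(sorted(c) for c in cherries(cat7)))
print(is_caterpillar(fig2), sorted(sorted(c) for c in cherries(fig2)))
```

The second check is needed because the nested-tuple root is an artifact of the encoding: leaves 1 and 2 form a cherry in the unrooted tree even though they are not siblings in the tuple.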



3 Results 

Theoretically, the shortest walk of the SPR
neighborhood would visit each tree exactly once
(that is, it would be a "Hamiltonian path").
In [6], this was shown to be impossible
for $n > 6$. This was done by showing that in the
SPR neighborhood of caterpillar trees, there are
at least 4 'isolated triangles' on the outer edge
of the neighborhood (see Figure 2) that force at
least two trees to be visited twice.

To bound the number of steps needed to visit 
the SPR neighborhood, we first introduce a new 
concept of an orbit of an edge. The orbit is all 
trees created by breaking that edge (in either 
direction) in an SPR move. More formally: 

Definition 8. Define for each edge $e$ of the tree
$T_0$, the orbit of $e$, $O_e$, to be all the trees that are
one SPR move from $T_0$ where the edge broken by
the SPR move is $e$.

In Figure 2, the SPR neighborhood, as well as
orbits, of the 7-taxon caterpillar tree are shown.

To calculate the size of the SPR neighborhood
of a tree, Allen and Steel (proof of Theorem 2.1,
[1]) characterized the relationship between the
trees in the neighborhood.
Theorem 9. Allen and Steel [1]: Let $T_0$ be an
unrooted phylogenetic tree on $n$ leaves and let
$N_{SPR} = N_{SPR}(T_0)$ be all trees that are a single
SPR move from $T_0$.

1. The size of the SPR neighborhood is
$|N_{SPR}| = 2(n - 3)(2n - 7)$.

2. The trees in $N_{SPR}$ that can be
obtained by more than one SPR move from
$T_0$ are exactly those from the NNI transformations.
Thus, there are $2n - 6$ of them.

3. The number of trees in $N_{SPR}$ that can be
obtained by only one SPR move from $T_0$ is
$4(n - 3)(n - 4)$.
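As a consistency check on the three counts above (a worked identity added for illustration, not part of the cited proof), the trees of items 2 and 3 together account for the whole neighborhood of item 1:

```latex
\begin{align*}
(2n - 6) + 4(n - 3)(n - 4)
  &= 2(n - 3)\bigl[\,1 + 2(n - 4)\,\bigr] \\
  &= 2(n - 3)(2n - 7) \;=\; |N_{SPR}| .
\end{align*}
```

For example, with $n = 7$ this reads $8 + 48 = 56$. The product $2(n - 3)(2n - 7)$ is the same quantity as the $(2n - 6)(2n - 7)$ quoted in the Introduction.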

From this, we make the following simple
observations about orbits:

Observation 10. Let $T_0$ be an unrooted phylogenetic
tree on $n$ leaves:

1. Every tree $T \in N_{SPR}$ belongs to some orbit
$O_e$ for $e \in E(T_0)$.

2. Each orbit contains $T_0$.

3. Excluding $T_0$, there are exactly $2n - 6$ trees
that are included in two orbits.

4. The number of orbits is the number of edges
in the tree, $2n - 3$.

5. The size of each orbit is $2n - 5$.

The SPR neighborhood is the union of the or- 
bits, but surprisingly, these orbits are mostly dis- 
joint. Roughly, the overlap of orbits is very small 
and they have very few neighbors in common. 
Formally: 

Lemma 11. Let $T_0$ be an unrooted phylogenetic
tree on $n$ taxa. Let $T_1, T_2 \in N_{SPR}(T_0)$ and suppose
there exists $e \in E(T_0)$ with $T_1, T_2 \in O_e$. Let $e_i$ be the target
edge of the move that created $T_i$ for $i = 1, 2$ (that
is, $T_1$ is formed by grafting some pruned subtree
of $T_0$ to $e_1$ and $T_2$ is the result of grafting a pruned
subtree to $e_2$).

Then, $T_1$ and $T_2$ differ by a single NNI move
if and only if $e_1$ and $e_2$ have a common endpoint
in $T_0$.



Proof. ($\Leftarrow$): Assume that $e_1$ and $e_2$ have a common
endpoint in $T_0$. Let $M$ be the subtree
pruned by the SPR move that creates $T_1$. Without
loss of generality, let the split induced by $e_1$
in $T_0$ be $ABC \mid DEM$ and the split induced by
$e_2$ in $T_0$ be $AB \mid CDEM$, where $A, B, C, D, E$,
and $M$ are the leaves of subtrees of $T_0$. Let $T_X$
refer to the subtree with leaves only from the set
$X$.

If $T_M$ is pruned to create $T_1$, then we have that
$T_1$ contains the splits $ABM \mid CDE$ and
$AB \mid MCDE$. If $T_M$ is also pruned to create $T_2$, then
we have that $T_2$ contains the splits $ABCM \mid DE$
and $ABC \mid MDE$. Thus, $T_1$ and $T_2$ differ
by a single NNI move (swapping $T_C$ and $T_M$),
and the hypothesis holds.

So, assume that $T_M$ is not pruned to create
$T_2$, but instead that $e$ is pruned in the other
direction. Let $N = S - M$, where $S$ is the set of
leaves of $T_0$. Since $e_1$ and $e_2$ share an endpoint,
at least one of them must be the edge pruned,
$e$. If both are $e$, then $T_1 = T_2 = T_0$, and the
hypothesis is trivially true. If only one, say $e_2$,
is $e$, then $e_1$ must be a neighbor of $e$ in $T_0$, which
implies $T_2 = T_1 = T_0$, and again the hypothesis
is trivially true.

($\Rightarrow$): By assumption, $T_1$ and $T_2$ differ by a
single NNI move. By definition of the NNI
move, there exists an edge in $E(T_1)$ that,
when removed, breaks $T_1$ into 4 distinct subtrees,
$T_A, T_B, T_C, T_D$, with leaf sets $A, B, C, D$,
and the split $AB \mid CD$ belongs to $T_1$ while
$BC \mid AD$ belongs to $T_2$. Since both $T_1$ and
$T_2$ are in the same orbit, the same edge $e$ is
pruned to create both. We note that since they
differ by only the NNI move, by the argument
above, the pruning of $e$ must occur in
the same direction for both to result in nontrivial
trees. Further, $e$ must prune one of the
subtrees $A, B, C, D$, since only one move is allowed
and $T_1$ and $T_2$ contain exactly the same
trees. Without loss of generality, assume that $A$
is pruned. We note the trees induced by the
leaves of $B, C, D$ are identical for these trees:

$T_0|_{B \cup C \cup D} = T_1|_{B \cup C \cup D} = T_2|_{B \cup C \cup D}$. It
follows that $e_1$ and $e_2$ share a common endpoint,
namely the intersection point of $B, C, D$. □

We can immediately give an upper bound on
the NNI-walk of the SPR neighborhood as $O(n^2)$
steps. The underlying idea is to traverse each orbit
separately, and then link these paths to form
a traversal of the entire SPR neighborhood:

Lemma 12. The SPR neighborhood has an
NNI-walk of length $O(n^2)$.

Proof. We will break the NNI-walk of the SPR
neighborhood into an NNI-walk of the orbit of
each edge in $T_0$. Since each orbit contains the
initial tree $T_0$, we can glue together the walks of
the orbits to make a walk of the entire space. Since
each orbit contains at most $2n - 5$ trees,
walking the $2n - 3$ orbits in this fashion yields
a walk whose number of steps is bounded by
$2(2n - 5)(2n - 3) = O(n^2)$.

It suffices to show that there is a 2-walk of
each orbit $O_e$ for $e \in E(T_0)$. Each tree $T \in O_e$
is created by pruning the edge $e$ in $T_0$ and regrafting
the pruned subtree to another edge in
$T_0$ (see Figure 2). Every tree in the orbit corresponds
to an edge in $T_0$ (namely, the target
edge), and by Lemma 11 the trees in the orbit are
connected exactly when their target edges share an
endpoint in $T_0$. Thus, the orbit can
be traversed in at most $2(2n - 3)$ steps by starting
at $T_0$ and following a depth-first search of
the tree (each tree in the orbit is visited at most
once on the way "down" the search and once on
the way "up" the search). □
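The depth-first traversal used in this proof can be sketched in a few lines. The example below is a simplified model, not the authors' construction: each vertex stands for a target edge of a small tree, two vertices are joined when the edges share an endpoint (the adjacency given by Lemma 11), and the names `dfs_closed_walk` and `shared_endpoint_graph` are assumptions made for illustration.

```python
def dfs_closed_walk(graph, start):
    """Closed walk through every vertex reachable from `start`.

    The walk follows a depth-first search, recording one step down each
    search-tree edge and one step back up, so it takes exactly
    2 * (k - 1) steps when k vertices are reachable.
    """
    walk, seen = [start], {start}

    def visit(node):
        for nxt in sorted(graph[node]):
            if nxt not in seen:
                seen.add(nxt)
                walk.append(nxt)   # step "down" the search
                visit(nxt)
                walk.append(node)  # step back "up" the search

    visit(start)
    return walk


def shared_endpoint_graph(edges):
    """Join two edges of a tree whenever they share an endpoint."""
    return {e: {f for f in edges if f != e and set(e) & set(f)} for e in edges}


# Edges of a 5-leaf tree with internal vertices x, y, z.
edges = [("1", "x"), ("2", "x"), ("x", "y"), ("3", "y"),
         ("y", "z"), ("4", "z"), ("5", "z")]
graph = shared_endpoint_graph(edges)
walk = dfs_closed_walk(graph, ("x", "y"))
steps = len(walk) - 1
print(steps, 2 * (len(edges) - 1))  # 12 12
```

With at most $2n - 3$ target edges, the $2(k - 1)$ steps of this walk stay within the $2(2n - 3)$ bound used in the proof.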

Showing the lower bound takes more work. It
follows from the next lemma that every orbit has very
small overlap with the other orbits:

Lemma 13. Let $T_1, T_2 \in N_{SPR}(T_0)$ such that
$T_1$ and $T_2$ are a single NNI move apart. Then
$d_{NNI}(T_0, T_1), d_{NNI}(T_0, T_2) \le 2$.

Proof. We note that if there exists an $e \in E(T_0)$
such that $T_1, T_2 \in O_e$, then by Lemma 11 the
lemma holds.

So, let us assume that there exist $e_1, e_2 \in E(T_0)$
with $T_1 \in O_{e_1}$, $T_1 \notin O_{e_2}$, $T_2 \in O_{e_2}$, and
$T_2 \notin O_{e_1}$. Let $M_i$ be the leaves of the subtree
pruned with $e_i$ from $T_0$ to create tree $T_i$,
$i = 1, 2$. Since $T_1$ and $T_2$ are a single NNI move
apart, by definition there exists a split in $T_1$,
$AB \mid CD$, that is rearranged in $T_2$: $BC \mid AD$.

We will argue, by cases, that both $T_1$ and $T_2$
are within 2 NNI moves of $T_0$. Without loss of
generality, we will assume that $M_1 \cap A \neq \emptyset$.

Case 1: $M_1 \subset A$. Then, let $A' = A - M_1$.
So, we have that $T_1$ contains the split
$A' M_1 B \mid CD$ and $T_2$ contains the split
$BC \mid A' M_1 D$. Since $T_1$ is only one SPR move from
$T_0$, the structure of the two trees is identical without
$M_1$, that is, $T_1|_{A' \cup B \cup C \cup D} = T_0|_{A' \cup B \cup C \cup D}$,
and $T_0$ includes an edge that splits $A'$ and
$B$ from $C$ and $D$. Since $T_2$ does not contain
such an edge, the move that creates it must
prune one of $T_{M_1}$, $T_{A'}$, $T_B$, $T_C$, or $T_D$. Pruning
$T_{M_1}$ is not possible since $T_1$ and $T_2$ are
in different orbits. Pruning $T_{A'}$ is only possible
if $T_{M_1}$ and $T_B$ have the same parent, in
which case $d_{NNI}(T_0, T_1), d_{NNI}(T_0, T_2) = 2$
and the lemma holds. Pruning $T_B$ to create
$T_2$ means that the subtree $T_{M_1}$ is on the edge
separating $A'$ and $B$ from $C$ and $D$ in $T_0$ and
$d_{NNI}(T_0, T_1), d_{NNI}(T_0, T_2) = 2$. Lastly, pruning
$T_B$, $T_C$, or $T_D$ is only possible if $T_0 = T_1$, in
which case $d_{NNI}(T_0, T_1) = 0$, $d_{NNI}(T_0, T_2) = 1$.

Case 2: $M_1 = A$. So, we have that $T_1$ contains
the split $M_1 B \mid CD$ and $T_2$ contains the
split $BC \mid M_1 D$. We have three possibilities
for $T_0$, namely, it could contain one of the following
three splits: $M_1 B \mid CD$, $BC \mid M_1 D$,
or $BD \mid M_1 C$. In each of these cases, we have
$d_{NNI}(T_0, T_1), d_{NNI}(T_0, T_2) \le 1$ and the lemma
holds.

Case 3: $M_1 \supset A$. So, $M_1 \cap B \neq \emptyset$. Since
$T_{M_1}$ is a subtree of $T_1$ and of $T_2$, it must contain
all of $B$. If $M_1 = A \cup B$, then the target edges
in $T_1$ and $T_2$ must separate $C$ and $D$, and are
identical. Similarly, if $M_1 \supset A \cup B$, $M_1$ must
contain all of $C$ or all of $D$, and the target edges
in $T_1$ and $T_2$ must preserve the rooting of the
remaining subtree, and thus are identical. Thus,
$d_{NNI}(T_0, T_1), d_{NNI}(T_0, T_2) = 0$.

□ 
Lemma 14. Let $U \subseteq O_e$ be a connected set consisting
of trees more than 2 NNI moves from $T_0$. Let
$n = |U|$. Then any NNI circuit of $U$ takes at
least $\frac{3}{2}(n - 1)$ steps.

Proof. By induction on $|U|$.

For $|U| = 1$: This is trivially true.

For $|U| > 1$, choose $x \in U$ closest to $T_0$. Since
$x$ is closest to $T_0$, not all of the neighbors of $x$
are in $U$ (if so, then there is an element in $U$
closer to $T_0$). Since $T_0$ is binary, $x$ has at most 4
neighbors in $N_{SPR}$. If $x$ has one neighbor in $U$,
then a circuit of $U$ must traverse the same edge
from $x$ to its neighbor twice, and the number
of steps needed is at least two more than the
number of steps needed for the smaller set
$U - \{x\}$. By the inductive hypothesis, this smaller set
takes at least $\frac{3}{2}(|U - \{x\}| - 1)$ steps. So, the
number of steps for $U$ is at least:

$\frac{3}{2}(|U - \{x\}| - 1) + 2 \ge \frac{3}{2}(|U| - 1)$



If $x$ has two neighbors in $U$, then call the subtrees
rooted at $x$'s neighbors $U_1$ and $U_2$. If the
neighbors of $x$ are connected, then it takes 3
steps to visit $x$ in a circuit of $x$, $U_1$, and $U_2$. If
they are not connected, it takes 4 steps. Thus, by
the inductive hypothesis, the number of steps needed
is at least:

$\frac{3}{2}(|U_1| - 1) + \frac{3}{2}(|U_2| - 1) + 3 \ge \frac{3}{2}(|U| - 1)$

If $x$ has 3 neighbors in $U$, then by a similar
argument, we have the lower bound.

If $x$ has 4 neighbors in $U$, then it is not the
closest element of $U$ to $T_0$, giving a contradiction.

□ 
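The two inductive inequalities can be checked by direct arithmetic. The sketch below assumes the constant in Lemma 14 is $\frac{3}{2}$, which is the value consistent with the $((n - 7) - 1)/2$ count of extra steps used in Lemma 15; it is a worked check added for illustration, not part of the original proof.

```latex
\begin{align*}
\text{one neighbor } (|U - \{x\}| = |U| - 1):\quad
  & \tfrac{3}{2}(|U| - 2) + 2
    = \tfrac{3}{2}(|U| - 1) + \tfrac{1}{2}
    \;\ge\; \tfrac{3}{2}(|U| - 1), \\
\text{two neighbors } (|U_1| + |U_2| = |U| - 1):\quad
  & \tfrac{3}{2}(|U_1| - 1) + \tfrac{3}{2}(|U_2| - 1) + 3
    = \tfrac{3}{2}(|U| - 3) + 3
    = \tfrac{3}{2}(|U| - 1).
\end{align*}
```

The two-neighbor case holds with equality, so $\frac{3}{2}$ is the largest constant this induction supports.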

From the last two lemmas, we have that the
orbits are mostly isolated: the only trees having
neighbors from outside the orbit are within
2 steps of $T_0$. Each of these isolated arms of the
orbit must be visited in an NNI walk of the SPR
neighborhood, and the walks of the isolated arms
take many extra steps. This yields our lower
bound:



Lemma 15. It takes $\Omega(n)$ extra steps to make
a circuit of an orbit.

Proof. Let $e \in E(T_0)$ and $O_e$ its orbit. Each
orbit has $2n - 5$ trees (Observation 10) and,
by Lemma 11, at most 8 have neighbors from
$N_{SPR} - O_e$.

It follows from Lemma 11 that the $2n - 13$ remaining
trees are in two connected sets. By the Pigeonhole
Principle, one set has at least $n - 7$
trees. By Lemma 14, it takes $\Omega(((n - 7) - 1)/2) = \Omega(n)$
extra steps to visit the larger connected set.

Thus, it takes $\Omega(n)$ extra steps to traverse the
orbit. □

The above lemmas can be combined to show
that $\Theta(n^2)$ extra steps are needed to traverse the
neighborhood, since there are $2n - 3$ orbits, and
each has minimal overlap with other orbits.

Theorem 16. Every SPR neighborhood takes
$(2n - 6)(2n - 7) + \Theta(n^2)$ steps to traverse.



Proof. The upper bound follows by Lemma 12.

For the lower bound: by Lemma 13, every orbit
$O_e$ has $\Omega(n)$ trees that have no neighbors in
other orbits. By Lemma 15, it takes $\Omega(n)$ extra
steps to traverse these regions of $O_e$. Since,
by Theorem 9, there are $2n - 3$ orbits, we have
that any path must take $\ge (2n - 3)\,\Omega(n) = \Omega(n^2)$
extra steps. □
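To get a feel for the quantities in Theorem 16, the short script below tabulates the neighborhood size $(2n - 6)(2n - 7)$ against the walk-length bound $2(2n - 5)(2n - 3)$ from the proof of Lemma 12. It only evaluates those two formulas; the function names are hypothetical, and the numbers are upper bounds from the text, not measured walk lengths.

```python
def neighborhood_size(n):
    """Number of trees in the SPR neighborhood of an n-leaf tree."""
    return (2 * n - 6) * (2 * n - 7)


def lemma12_walk_bound(n):
    """Upper bound on the walk length from the proof of Lemma 12."""
    return 2 * (2 * n - 5) * (2 * n - 3)


for n in (7, 10, 100, 1000):
    size = neighborhood_size(n)
    bound = lemma12_walk_bound(n)
    # The excess over the neighborhood size is 4n^2 - 6n - 12, quadratic in n.
    print(n, size, bound, bound - size, round(bound / size, 3))
```

The ratio of the two columns tends to 2 as $n$ grows, so the orbit-by-orbit walk overshoots the number of trees by a quadratic amount, matching the order of the lower bound.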